Abstract. Alternative splicing allows an organism to make different proteins
in different cells at different times, all from the same gene. In a cell that 
uses alternative splicing, the total length of all the exons is much shorter than 
in a cell that encodes the same set of proteins without alternative splicing. 
This economical use of exons makes genes more stable during reproduction and 
development because a genome with a shorter exon length is more resistant to 
harmful mutations. Genomic stability may be the reason why higher vertebrates 
splice alternatively. For a broad class of alternatively spliced genes, a formula is 
given for the increase in their stability. 
What is alternative splicing? A procaryote (no nucleus) transcribes one or more
genes into mRNA and immediately translates the mRNA into protein. But a eucaryote
first transcribes a single gene into pre-mRNA, and then, using spliceosomes, turns the 
pre-mRNA into mRNA by splicing out most or all of its introns and often many of 
its exons. The eucaryote then exports the mRNA out of its nucleus into its cytosol, 
where its ribosomes translate the mRNA into protein. A eucaryote often can make 
different proteins from the same pre-mRNA transcript by splicing it in different ways. 
This trick is called alternative splicing. 

Why do higher vertebrates splice alternatively? Alternative splicing allows an 
organism to make different proteins in different cells at different times, all from the 
same gene, by poorly understood regulatory devices [Alberts et al., 2002]. But this
diversity of proteins could also be produced by several different genes controlled by 
promoters and enhancers — in fact, that is how biologists thought genes worked 
until they discovered alternative splicing. The advantage of alternative splicing is 
that its economical use of exons makes genes more stable during reproduction and 
development. 

This communication gives a formula and a rule of thumb for the increase in the 
stability of a broad class of alternatively spliced genes. The DSCAM gene of the fruit 
fly and the cSlo gene of the chicken provide examples that illustrate the formula and 
the rule. A Monte Carlo simulation, displayed in Figure 1, suggests how alternative
splicing may help dividing human cells avoid excessive mutations. 

How does alternative splicing make genes more stable? Consider, for instance, 
a gene that has a long exon of 1000 base pairs (b) and two short ones, each 100 b 
long. The total length of its exons is 1200 b. Alternative splicing allows the cell to 
make two different RNAs, each of 1100 b. Without alternative splicing, the cell would 
need two genes, each 1100 b long, for a total exon length of 2200 b. Thus in this 
example, alternative splicing reduces the length of the exons in the DNA by 45%. 
This reduction in the length of exonic DNA implies a reduction of 45% in the error 
rate during the replication of this gene. In effect, the gene is nearly twice as stable 
due to alternative splicing. Since an error in the replication of critical exonic DNA is 
potentially lethal, this extra genomic stability is biologically significant and is one of 
the reasons why higher eucaryotes use alternative splicing. Computer scientists will 
recognize alternative splicing as akin to file compression [Ford, 2001].
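The arithmetic of this toy example can be checked in a few lines. The following is only a minimal sketch; the variable names are illustrative and do not come from the paper.

```python
# Toy gene from the text: one long exon plus two short, mutually exclusive exons.
long_exon = 1000          # base pairs
short_exons = [100, 100]  # base pairs each

# With alternative splicing, every exon is stored once in the DNA.
with_splicing = long_exon + sum(short_exons)

# Without it, each product needs its own gene: the long exon plus one short exon.
without_splicing = sum(long_exon + s for s in short_exons)

reduction = 1 - with_splicing / without_splicing   # fractional saving in exonic DNA
stability_gain = without_splicing / with_splicing  # ratio of exonic lengths

print(with_splicing, without_splicing)             # 1200 2200
print(f"{reduction:.0%} {stability_gain:.2f}")     # 45% 1.83
```

The ratio of about 1.8 is what the text describes as "nearly twice as stable".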

More generally, let us consider a gene that has M groups of mutually exclusive 
exons in addition to the constitutively spliced exons (the exons that are always kept 
in the mRNA). For each group i (i = 1, ..., M), let $N_i$ denote the number of mutually
exclusive exons in the group, including the null exon of length zero if the spliceosome
may splice out all the exons of the group. Assume that the spliceosome always selects
at most one exon and no introns from each of the M groups of $N_i$ exons, with no
shuffling. Assume that the organism expresses all

$$N = \prod_{j=1}^{M} N_j \tag{1}$$

possible proteins at some time in some cell.

Let us use $L_c$ for the total length in nucleotides of the constitutively spliced exons.
Without alternative splicing, these $L_c$ nucleotides would be repeated in each of the N
proteins for a total length of $N L_c$.

If $L_{ik}$ is the length of exon k of group i, then without alternative splicing, each
of the $N_i$ exons of length $L_{ik}$ would be repeated $N/N_i$ times. So the total length
devoted to group i without alternative splicing would be

$$\frac{N}{N_i} \sum_{k=1}^{N_i} L_{ik} = \Bigl( \prod_{j=1,\, j \ne i}^{M} N_j \Bigr) \sum_{k=1}^{N_i} L_{ik}. \tag{2}$$

Thus the number of nucleotides that would be needed to encode all N proteins
and that would have to be copied correctly each time a cell divides is

$$N_{nas} = N \Bigl( L_c + \sum_{i=1}^{M} \frac{1}{N_i} \sum_{k=1}^{N_i} L_{ik} \Bigr) \tag{3}$$

without alternative splicing. 

But with alternative splicing, the number of needed nucleotides is only the length 
of all the exons, 

$$N_{as} = L_c + \sum_{i=1}^{M} \sum_{k=1}^{N_i} L_{ik}. \tag{4}$$

Since the error rate in the replication of DNA is $10^{-9}$ per base
pair [Alberts et al., 2002], the probability of an exonic error in the gene during replication
is $N_{nas} \times 10^{-9}$ without alternative splicing, but only $N_{as} \times 10^{-9}$ with alternative
splicing. So if we ignore the critical control sequences in the introns, then the ratio

$$I = \frac{N_{nas}}{N_{as}} \tag{5}$$

is the increase in the stability of the gene due to alternative splicing. The intron
control sequences probably boost I slightly.
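Equations (1), (3), (4) and (5) translate directly into code. The sketch below is illustrative only: the function name `exon_budget` and its argument layout are my own assumptions, with `groups[i][k]` holding the length of exon k of group i (a null exon is entered as length 0).

```python
from math import prod

def exon_budget(l_c, groups):
    """Return (N, N_nas, N_as, I) for a gene with constitutive length l_c.

    groups is a list of lists of exon lengths, one inner list per group
    of mutually exclusive exons.
    """
    n_proteins = prod(len(g) for g in groups)                          # Eq. (1)
    n_nas = n_proteins * (l_c + sum(sum(g) / len(g) for g in groups))  # Eq. (3)
    n_as = l_c + sum(sum(g) for g in groups)                           # Eq. (4)
    return n_proteins, n_nas, n_as, n_nas / n_as                       # Eq. (5)

# The toy gene discussed earlier: 1000 b constitutive, one group of two 100 b exons.
n, n_nas, n_as, increase = exon_budget(1000, [[100, 100]])
print(n, n_nas, n_as, round(increase, 2))  # 2 2200.0 1200 1.83
```

Passing explicit exon lengths keeps the sketch general; the equal-length simplification used for DSCAM is then just a special case.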

The DSCAM gene of Drosophila provides a striking example of alternative 
splicing. This gene encodes receptors that guide the growth of the axon of Bolwig's 
nerve in the fly embryo [Schmucker et al., 2000]. It has M = 4 groups of $N_1 = 12$,
$N_2 = 48$, $N_3 = 33$, and $N_4 = 2$ exons [Schmucker et al., 2000; Black, 2000]. The
exons in each group are mutually exclusive, and the total number of possible proteins
is N = 12 × 48 × 33 × 2 = 38,016. The DSCAM gene, including introns, is 61.2 kb
long, and its mRNA, after transcription and splicing, contains 24 exons and is 7.8 kb
long [Schmucker et al., 2000; Black, 2000].

The ratio $N_{nas}/N_{as}$ depends explicitly upon the lengths $L_c$ and $L_{ik}$. Since most
internal exons are between 50 and 300 nucleotides in length [Smith & Valcarcel, 2000],
let us simplify the bookkeeping by setting $L_{ik} = 200$ b. The spliced DSCAM mRNA
is 7.8 kb long and contains 4 alternatively spliced exons and 20 constitutively spliced
exons. So the set of constitutively spliced exons is of length

$$L_c = 7800 - 4 \times 200 = 7000 \tag{6}$$

or $L_c = 7$ kb. Thus by Eq. (4), the exonic length required with alternative splicing is

$$N_{as} = L_c + 200 \sum_{i=1}^{4} N_i = 7000 + 200\,(12 + 48 + 33 + 2) = 26000 \tag{7}$$

or $N_{as} = 26$ kb. But by Eq. (3), the exonic length required without alternative splicing
is

$$N_{nas} = N (L_c + 200\, M) = 38016 \times 7800 = 296524800 \tag{8}$$
Figure 1 (horizontal axis: cell divisions). After 46 cell divisions, the number of
defects in the exons of a human diploid cell increases to 4.4 ± 0.07 with alternative
splicing (lower curves, green) and to 22.0 ± 0.14 without alternative splicing (upper
curves, red).



or $N_{nas} = 297$ Mb, which, incidentally, is nearly twice the length of the entire
Drosophila genome and about six times the length of all the exons in the human
genome. With these assumptions, the chance of a crucial error in the DSCAM gene
during replication is 0.30 without alternative splicing, but only $2.6 \times 10^{-5}$ with
alternative splicing. The ratio

$$I = \frac{N_{nas}}{N_{as}} = 11400 = 1.1 \times 10^{4} \tag{9}$$

is the increase in genetic stability due to alternative splicing.
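The DSCAM numbers can be reproduced step by step. This is a hedged sketch under the text's own simplification that every alternative exon is 200 b long; the variable names are illustrative.

```python
from math import prod

group_sizes = [12, 48, 33, 2]  # N_i for the four groups of mutually exclusive exons
exon_len = 200                 # assumed length of every alternative exon, in bases
mrna_len = 7800                # length of the spliced DSCAM mRNA, in bases
error_rate = 1e-9              # replication errors per base pair

n_proteins = prod(group_sizes)                            # number of possible proteins
l_c = mrna_len - len(group_sizes) * exon_len              # constitutive exon length
n_as = l_c + exon_len * sum(group_sizes)                  # exonic length with splicing
n_nas = n_proteins * (l_c + exon_len * len(group_sizes))  # exonic length without splicing

p_error_nas = n_nas * error_rate  # chance of an exonic error per replication, without
p_error_as = n_as * error_rate    # chance of an exonic error per replication, with
increase = n_nas / n_as           # stability increase I

print(n_proteins, l_c, n_as, n_nas)  # 38016 7000 26000 296524800
print(f"{p_error_nas:.2f} {p_error_as:.1e} {increase:.0f}")
```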

Fruit flies without alternative splicing would accumulate about 10,000 exonic 
DSCAM errors in 30,000 generations (2,500 years), and each fly would have its 
own set of 10,000 errors. Over this period, the DSCAM gene of the fly population 
gradually would become uniformly dysfunctional with relatively small differences in 
fitness among individual flies. On the other hand, flies with alternative splicing would
accumulate less than one exonic DSCAM error in 30,000 generations. Moreover, the
probability that the one error would occur in the $L_c$ nucleotides of exons that are
constitutively expressed would be $L_c/N_{as} = 7/26 = 0.27$, and that unlucky fly would be distinctly
unfit. Thus alternative splicing not only avoids exonic errors; it also helps natural 
selection weed out unfit individuals. Alternative splicing and natural selection 
cooperate to preserve the integrity of the genome. 
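The generation counts above follow from multiplying the per-replication error probability by the number of generations. The sketch below assumes one relevant replication per generation and no selection, which is my reading of the estimate rather than something the text states explicitly.

```python
generations = 30_000
error_rate = 1e-9       # replication errors per base pair

n_nas = 296_524_800     # exonic length without alternative splicing (bases)
n_as = 26_000           # exonic length with alternative splicing (bases)
l_c = 7_000             # constitutive exon length (bases)

errors_nas = generations * n_nas * error_rate  # expected errors without splicing
errors_as = generations * n_as * error_rate    # expected errors with splicing
share_constitutive = l_c / n_as                # chance an error hits a constitutive exon

print(f"{errors_nas:.0f} {errors_as:.2f} {share_constitutive:.2f}")
```

Under these assumptions the expected count without alternative splicing comes out near 9,000, which the text rounds to about 10,000, and the count with alternative splicing stays below one.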

In most genes, the increase in genomic stability due to alternative splicing might 
be more like 5 or 10 than $10^4$, but even a fivefold increase in genetic stability during
reproduction and development is worth the trouble of alternative splicing. For if
without alternative splicing the average gene were 5 times longer, then 7.5% rather
than 1.5% of the genomes of higher vertebrates would consist of exons. The DNA
of a human diploid cell has 6.4 billion base pairs. The error rate of $10^{-9}$ per
base pair implies that on average there will be 6.4 errors per cell division. With
alternative splicing, only 1.5% of these errors occur in exons and are potentially
deleterious, so the probability of a daughter cell with perfect exons is approximately
$P_{as} = 1 - 6.4 \times 0.015 = 0.904$. A more accurate estimate is

$$P_{as} = (1 - 10^{-9})^{0.015 \times 6.4 \times 10^{9}} \approx e^{-0.096} = 0.908. \tag{10}$$

Without alternative splicing, 7.5% of the errors would occur in exons, and so the 
probability of a daughter cell with perfect exons would be roughly $P_{nas} = 1 - 6.4 \times
0.075 = 0.52$. A more accurate estimate is

$$P_{nas} = (1 - 10^{-9})^{0.075 \times 6.4 \times 10^{9}} \approx e^{-0.48} = 0.619. \tag{11}$$

The adult human arises from about 46 cell divisions, so the probability that any given 
adult cell has perfect exons is $(P_{as})^{46} = 0.012$ with alternative splicing, but only
$(P_{nas})^{46} = 2.6 \times 10^{-10}$ without alternative splicing.
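These probabilities are easy to verify numerically. The sketch below is illustrative; `p_perfect` is a hypothetical helper name, and it uses `math.log1p` so that the tiny error rate is not lost to floating-point rounding.

```python
import math

GENOME_BP = 6.4e9   # base pairs in a human diploid cell
ERROR_RATE = 1e-9   # replication errors per base pair

def p_perfect(exon_fraction):
    """Probability that one division copies every exonic base pair correctly."""
    exonic_bp = exon_fraction * GENOME_BP
    # (1 - ERROR_RATE) ** exonic_bp, evaluated stably through logarithms
    return math.exp(exonic_bp * math.log1p(-ERROR_RATE))

p_as = p_perfect(0.015)   # with alternative splicing: 1.5% of the genome is exonic
p_nas = p_perfect(0.075)  # without: five times as much exonic DNA

print(f"{p_as:.3f} {p_nas:.3f}")           # 0.908 0.619
print(f"{p_as**46:.3f} {p_nas**46:.1e}")   # 0.012 2.6e-10
```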

To estimate the implications of alternative splicing for human evolution and 
development, I again assumed that the human genome without alternative splicing 
would have five times more exonic base pairs. I let two sets of 1000 cells divide 50 times 
in silico. The set of cells that used alternative splicing had $0.015 \times 6.4 \times 10^{9} = 9.6 \times 10^{7}$
exonic base pairs; the set that did not use alternative splicing had 5 times as many
or $4.8 \times 10^{8}$ exonic base pairs. I divided the 1000 cells into 20 groups of 50 cells
each and plotted in the figure the average number of exonic errors per cell for each 
of the 20 groups with and without alternative splicing. As shown in the figure, the 
average number of defective exonic base pairs per daughter cell after 46 cell divisions 
is 4.43 ± 0.07 with alternative splicing (lower, green lines) but 22.0 ± 0.14 without 
alternative splicing (upper, red lines). Since with alternative splicing, cells free of 
exonic error produce daughter cells that also are free of exonic error at a rate of 91%, 
apoptosis followed by division of adjacent cells can correct the 1 or 2 of the 4 exonic 
errors that are troublesome. But because without alternative splicing, cells free of 
exonic error produce daughter cells free of exonic error at a rate of only 62%, it is 
hard to see how apoptosis could cope with 22 exonic errors per adult cell. 
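A simulation of this kind can be sketched as follows. The text does not give the details of the original program, so everything here is an assumption: each of 1000 cell lineages gains a Poisson-distributed number of new exonic errors at every division, errors are never repaired, and there is no selection or apoptosis. The function names are hypothetical.

```python
import math
import random

ERROR_RATE = 1e-9   # replication errors per base pair per division (from the text)
GENOME_BP = 6.4e9   # base pairs in a human diploid cell (from the text)

def poisson(lam, rng):
    """Draw one Poisson(lam) sample by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def mean_exonic_errors(exonic_bp, divisions=46, cells=1000, seed=0):
    """Average number of accumulated exonic errors per lineage after `divisions` divisions."""
    rng = random.Random(seed)
    lam = exonic_bp * ERROR_RATE  # expected new exonic errors per division
    totals = [sum(poisson(lam, rng) for _ in range(divisions)) for _ in range(cells)]
    return sum(totals) / cells

with_as = mean_exonic_errors(0.015 * GENOME_BP)         # 9.6e7 exonic base pairs
without_as = mean_exonic_errors(5 * 0.015 * GENOME_BP)  # 4.8e8 exonic base pairs
print(f"with alternative splicing:    {with_as:.2f}")
print(f"without alternative splicing: {without_as:.2f}")
```

Under this model the expected means are 46 × 0.096 ≈ 4.4 and 46 × 0.48 ≈ 22.1, with a standard error over 1000 lineages of roughly 0.07 and 0.15, which is in line with the values quoted above.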

We may derive a rule of thumb for the increase in genetic stability by noting that
$\langle L_s \rangle$ defined by

$$\langle L_s \rangle = \sum_{i=1}^{M} \frac{1}{N_i} \sum_{k=1}^{N_i} L_{ik} \tag{12}$$

is an effective average length of the alternative exons that are spliced into the mRNA
and that $\langle N \rangle$ defined by

$$\langle N \rangle \langle L_s \rangle = \sum_{i=1}^{M} \sum_{k=1}^{N_i} L_{ik} \tag{13}$$

is a kind of average of the numbers $N_i$ of alternative exons in the M groups. Let us
further use r for the ratio of the average length $\langle L_s \rangle$ of the selected exons to the length
$L_c$ of the constitutively spliced exons

$$r = \frac{\langle L_s \rangle}{L_c}. \tag{14}$$

Then with these definitions, the increase I in genetic stability is

$$I = \frac{N_{nas}}{N_{as}} = N \, \frac{1 + r}{1 + \langle N \rangle r}. \tag{15}$$
The fraction that multiplies the total number N of possible proteins is less than
unity. But it is generally not tiny because the ratio r usually is small and because $\langle N \rangle$
usually is less than 30. In the case of Drosophila DSCAM and with the assumptions
$L_c = 7.0$ kb and $L_{ik} = 200$ b, the four selected exons are of length $\langle L_s \rangle = 800$ b; the
ratio r is r = 800/7000 = 0.114; and the effective average number $\langle N \rangle$ of exons per
group is $\langle N \rangle = 95 \times 200 / \langle L_s \rangle = 95/4 = 23.7$. The fraction $(1 + r)/(1 + \langle N \rangle r) = 3/10$,
and the increase in genetic stability is I = 0.3 N = 11,400.

Hearing in chickens provides another example of the contribution of alternative 
splicing to genetic stability. The cSlo gene of the chicken cochlea encodes the 
membrane proteins that form the Ca$^{2+}$-activated K$^{+}$ channels that determine the
resonant frequency of each hair cell in the basilar papilla. Alternative splicing
provides some N = 576 variants of the mRNA for each of the four components
of this tetramer membrane protein [Rosenblatt et al., 1997; Navaratnam et al., 1997;
Black, 1998], resulting in a huge number of possible resonant frequencies. In cSlo, the
ratio r = 0.1, and the mean number $\langle N \rangle$ of exons in each of the eight groups is
about 2.6 [Rosenblatt et al., 1997]. So by the rule of thumb (15), alternative splicing
increases the genetic stability of cSlo by a factor of about

$$I = 576 \times \frac{1.1}{1.26} = 503. \tag{16}$$
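The rule of thumb is a one-line function of N, r and the effective mean group size. The sketch below applies it to both examples; the function name `stability_increase` is illustrative.

```python
def stability_increase(n_proteins, r, mean_n):
    """Rule of thumb: I = N (1 + r) / (1 + <N> r)."""
    return n_proteins * (1 + r) / (1 + mean_n * r)

# Drosophila DSCAM with L_c = 7000 b and all alternative exons 200 b long
dscam = stability_increase(38016, 800 / 7000, 95 / 4)

# Chicken cSlo with r = 0.1 and a mean of about 2.6 exons per group
cslo = stability_increase(576, 0.1, 2.6)

print(f"{dscam:.0f} {cslo:.0f}")  # 11405 503
```

The DSCAM value agrees with the direct ratio of exonic lengths because the rule is an exact rewriting of that ratio, not an approximation.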

The tetrameric structure of the functional membrane protein effectively boosts I by
another factor.

Another example of alternative splicing's exonic economy is provided by 
the mammalian immune system, which uses site-specific genetic recombination in 
developing B cells [Alberts et al., 2002].

We have seen that the exonic economy of alternative splicing increases the stability 
of the genome. As genomics and proteomics advance, the protein-to-gene ratios of the 
higher vertebrates will teach us how much alternative splicing actually contributes to 
the stability of their genomes. 